language model
Unbounded cache model for online language modeling with open vocabulary
Recently, continuous cache models were proposed as extensions to recurrent neural network language models, to adapt their predictions to local changes in the data distribution. These models only capture the local context, of up to a few thousands tokens. In this paper, we propose an extension of continuous cache models, which can scale to larger contexts. In particular, we use a large scale non-parametric memory component that stores all the hidden activations seen in the past. We leverage recent advances in approximate nearest neighbor search and quantization algorithms to store millions of representations while searching them efficiently. We conduct extensive experiments showing that our approach significantly improves the perplexity of pre-trained language models on new distributions, and can scale efficiently to much larger contexts than previously proposed local cache models.
An AI image generator for non-English speakers
Although text-to-image generation is rapidly advancing, these AI models are mostly English-centric. Researchers at the University of Amsterdam Faculty of Science have created NeoBabel, an AI image generator that can work in six different languages. By making all elements of their research open source, anyone can build on the model and help push inclusive AI research. When you generate an image with AI, the results are often better when your prompt is in English. This is because many AI models are English at their core: if you use another language, your prompt is translated into English before the image is created.
- Europe > Netherlands > North Holland > Amsterdam (0.27)
- Asia > Singapore (0.05)
LightRNN: Memory and Computation-Efficient Recurrent Neural Networks
Recurrent neural networks (RNNs) have achieved state-of-the-art performances in many natural language processing tasks, such as language modeling and machine translation. However, when the vocabulary is large, the RNN model will become very big (e.g., possibly beyond the memory capacity of a GPU device) and its training will become very inefficient. In this work, we propose a novel technique to tackle this challenge. The key idea is to use 2-Component (2C) shared embedding for word representations. We allocate every word in the vocabulary into a table, each row of which is associated with a vector, and each column associated with another vector.
Fully Neural Network Based Speech Recognition on Mobile and Embedded Devices
Real-time automatic speech recognition (ASR) on mobile and embedded devices has been of great interests for many years. We present real-time speech recognition on smartphones or embedded systems by employing recurrent neural network (RNN) based acoustic models, RNN based language models, and beam-search decoding. The acoustic model is end-to-end trained with connectionist temporal classification (CTC) loss. The RNN implementation on embedded devices can suffer from excessive DRAM accesses because the parameter size of a neural network usually exceeds that of the cache memory and the parameters are used only once for each time step. To remedy this problem, we employ a multi-time step parallelization approach that computes multiple output samples at a time with the parameters fetched from the DRAM.
Unsupervised Text Style Transfer using Language Models as Discriminators
Binary classifiers are employed as discriminators in GAN-based unsupervised style transfer models to ensure that transferred sentences are similar to sentences in the target domain. One difficulty with the binary discriminator is that error signal is sometimes insufficient to train the model to produce rich-structured language. In this paper, we propose a technique of using a target domain language model as the discriminator to provide richer, token-level feedback during the learning process. Because our language model scores sentences directly using a product of locally normalized probabilities, it offers more stable and more useful training signal to the generator. We train the generator to minimize the negative log likelihood (NLL) of generated sentences evaluated by a language model. By using continuous approximation of the discrete samples, our model can be trained using back-propagation in an end-to-end way. Moreover, we find empirically with a language model as a structured discriminator, it is possible to eliminate the adversarial training steps using negative samples, thus making training more stable. We compare our model with previous work using convolutional neural networks (CNNs) as discriminators and show our model outperforms them significantly in three tasks including word substitution decipherment, sentiment modification and related language translation.
- Asia > Singapore (0.05)
- North America > United States (0.04)
- Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
- Asia > Middle East > Israel > Jerusalem District > Jerusalem (0.04)
- Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (0.99)
- Information Technology > Artificial Intelligence > Natural Language (0.72)
- Information Technology > Communications > Social Media (0.69)
- Information Technology > Artificial Intelligence > Machine Learning (0.69)
Reinforcement learning applied to autonomous vehicles: an interview with Oliver Chang
In this interview series, we're meeting some of the AAAI/SIGAI Doctoral Consortium participants to find out more about their research. We caught up with Oliver Chang whose research interests span deep reinforcement learning, autonomous vehicles, and explainable AI. We found out more about some of the projects he's worked on so far, what drew him to the field, and what future AI directions he's excited about. Could you give us a quick introduction to who you are, where you're studying, and the topic of your research? I'm specializing in reinforcement learning applied to autonomous vehicles and UAVs.
- Education (0.70)
- Government (0.48)
Google built a flash-flood prediction tool using Gemini and old news reports
It's the first time that the company has used language models for this kind of thing. Flash floods are, but Google might have a novel solution. The company, a prediction tool for flash floods that uses Gemini to source data from old news reports. This is the first time it has used a language model for this type of work. Flash flood prediction models need historical data and model training that often doesn't exist.
- Marketing (0.47)
- Health & Medicine (0.32)
- Government (0.30)
- Information Technology > Communications > Mobile (1.00)
- Information Technology > Artificial Intelligence (1.00)
Adaptation to Intrinsic Dependence in Diffusion Language Models
Diffusion language models (DLMs) have recently emerged as a promising alternative to autoregressive (AR) approaches, enabling parallel token generation beyond a rigid left-to-right order. Despite growing empirical success, the theoretical understanding of how unmasking schedules -- which specify the order and size of unmasked tokens during sampling -- affect generation quality remains limited. In this work, we introduce a distribution-agnostic unmasking schedule for DLMs that adapts to the (unknown) dependence structure of the target data distribution, without requiring any prior knowledge or hyperparameter tuning. In contrast to prior deterministic procedures that fix unmasking sizes, our method randomizes the number of tokens revealed at each iteration. We show that, for two specific parameter choices, the sampling convergence guarantees -- measured by Kullback-Leibler (KL) divergence -- scale as $\widetilde O(\mathsf{TC}/K)$ and $\widetilde O(\mathsf{DTC}/K)$ respectively. Here, $K$ is the number of iterations, and $\mathsf{TC}$ and $\mathsf{DTC}$ are the total correlation and dual total correlation of the target distribution, capturing the intrinsic dependence structure underlying the data. Importantly, our guarantees hold in the practically relevant parallel-sampling regime $K
- Workflow (0.93)
- Research Report > New Finding (0.87)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- North America > Canada > Quebec > Montreal (0.04)